This analysis addresses feedback from Francis regarding the temporal alignment of weather predictors with butterfly responses. The original daily-level analysis used fixed 6am-6pm weather windows, which Francis identified as having a temporal logic issue:
“All of the metrics for wind, temperature and light, would need to be re-calculated for 24 hour periods that begin at the time of the highest count.”
1.1 Francis’s Key Points
Temporal alignment: Butterflies can only respond to weather after it occurs
Biological timing: If max count occurred at 2pm on day t-1, the relevant weather window should start at 2pm, not 6am
Roosting decisions: Weather from max count through sunset determines whether butterflies abandon the roost
We evaluate three butterfly difference metrics (max, 95th percentile, and top 3 mean) with three transformations each (untransformed, square root, and square) to determine which best approximates normality.
**Best transformation for normality:** butterfly_diff_sqrt
(W = 0.9893 , p = 0.6368 )
Code
# Create histogramspar(mfrow =c(3, 3), mar =c(4, 4, 3, 1))for (var in response_vars) { var_data <- data_filtered[[var]] var_data <- var_data[!is.na(var_data)]# Histogramhist(var_data, breaks =30, probability =TRUE,main =sprintf("%s\n(W=%.3f, p=%.4f)", var, normality_tests$shapiro_stat[normality_tests$variable == var], normality_tests$p_value[normality_tests$variable == var]),xlab ="Value", col ="steelblue", border ="black")# Overlay normal distribution x_seq <-seq(min(var_data), max(var_data), length.out =100)lines(x_seq, dnorm(x_seq, mean(var_data), sd(var_data)),col ="red", lwd =2)# Add gridgrid()}
4.1 Key Findings
Square root transformations perform best, with all three failing to reject normality (p > 0.5)
Untransformed differences show significant departures from normality (p < 0.0001)
Square transformations severely violate normality with extreme kurtosis (>9)
butterfly_diff_sqrt has the highest Shapiro-Wilk statistic (W = 0.9893, p = 0.6368)
Square root transformations achieve near-zero skewness and kurtosis close to 0
Selected response variable for modeling:butterfly_diff_sqrt (square root of max butterflies difference)
5 Candidate Predictor Variables
Following the original analysis structure, we consider all potential weather and baseline metrics, then use correlation analysis to select final predictors.
var_descriptions <-tribble(~Variable, ~Description, ~Type,"butterflies_95th_percentile_t_1", "95th percentile count on previous day (baseline)", "Baseline","max_butterflies_t_1", "Maximum count on previous day", "Baseline","butterflies_top3_mean_t_1", "Mean of top 3 counts on previous day", "Baseline","temp_max", "Max temp from max count to sunset (includes overnight)", "Temperature","temp_min", "Min temp from max count to sunset (includes overnight)", "Temperature","temp_mean", "Mean temp from max count to sunset", "Temperature","temp_at_max_count_t_1", "Temperature when max count occurred", "Temperature","hours_above_15C", "Hours ≥15°C in window", "Temperature","degree_hours_above_15C", "Cumulative degree-hours >15°C", "Temperature","wind_avg_sustained", "Mean sustained wind speed in window", "Wind","wind_max_gust", "Maximum gust in window (includes overnight)", "Wind","wind_gust_sum", "Sum of all gust measurements", "Wind","wind_gust_sum_above_2ms", "Sum of gusts >2 m/s", "Wind","wind_gust_hours", "Gust-hours (integral)", "Wind","wind_minutes_above_2ms", "Minutes with wind ≥2 m/s", "Wind","wind_gust_sd", "SD of gust speeds (variability)", "Wind","wind_mode_gust", "Most frequent gust speed", "Wind","sum_butterflies_direct_sun", "Sum of butterflies in direct sun (entire lag window)", "Sun","lag_duration_hours", "Window length in hours", "Window")kable(var_descriptions, caption ="Candidate predictor variables")
Candidate predictor variables
Variable
Description
Type
butterflies_95th_percentile_t_1
95th percentile count on previous day (baseline)
Baseline
max_butterflies_t_1
Maximum count on previous day
Baseline
butterflies_top3_mean_t_1
Mean of top 3 counts on previous day
Baseline
temp_max
Max temp from max count to sunset (includes overnight)
Temperature
temp_min
Min temp from max count to sunset (includes overnight)
Temperature
temp_mean
Mean temp from max count to sunset
Temperature
temp_at_max_count_t_1
Temperature when max count occurred
Temperature
hours_above_15C
Hours <U+2265>15<U+00B0>C in window
Temperature
degree_hours_above_15C
Cumulative degree-hours >15<U+00B0>C
Temperature
wind_avg_sustained
Mean sustained wind speed in window
Wind
wind_max_gust
Maximum gust in window (includes overnight)
Wind
wind_gust_sum
Sum of all gust measurements
Wind
wind_gust_sum_above_2ms
Sum of gusts >2 m/s
Wind
wind_gust_hours
Gust-hours (integral)
Wind
wind_minutes_above_2ms
Minutes with wind <U+2265>2 m/s
Wind
wind_gust_sd
SD of gust speeds (variability)
Wind
wind_mode_gust
Most frequent gust speed
Wind
sum_butterflies_direct_sun
Sum of butterflies in direct sun (entire lag window)
Sun
lag_duration_hours
Window length in hours
Window
6 Data Quality Assessment
Code
cat("Data completeness summary:\n")
Data completeness summary:
Code
cat("- Observations with metrics_complete = 0:", sum(daily_data$metrics_complete ==0), "\n")
- Observations with metrics_complete = 0: 2
Code
cat("- Observations with wind_data_coverage < 0.5:", sum(daily_data$wind_data_coverage <0.5), "\n")
- Observations with wind_data_coverage < 0.5: 5
Code
cat("- Mean temperature coverage:", round(mean(daily_data$temp_data_coverage), 3), "\n")
- Mean temperature coverage: 1
Code
cat("- Mean wind coverage:", round(mean(daily_data$wind_data_coverage), 3), "\n")
- Mean wind coverage: 0.952
Code
cat("- Mean butterfly coverage:", round(mean(daily_data$butterfly_data_coverage), 3), "\n\n")
- Mean butterfly coverage: 0.989
Code
# Show observations with low wind coveragelow_wind <- daily_data %>%filter(wind_data_coverage <0.5) %>%select(deployment_id, date_t_1, date_t, wind_data_coverage, butterflies_95th_percentile_t_1, butterflies_95th_percentile_t)if (nrow(low_wind) >0) {cat("Observations with <50% wind coverage:\n")kable(low_wind,caption ="Low wind data coverage (likely wind database gaps, not zero-butterfly exclusions)",digits =3)} else {cat("All observations have adequate wind coverage\n")}
Observations with <50% wind coverage:
Low wind data coverage (likely wind database gaps, not zero-butterfly exclusions)
deployment_id
date_t_1
date_t
wind_data_coverage
butterflies_95th_percentile_t_1
butterflies_95th_percentile_t
SC10
2024-01-27
2024-01-28
0.185
0.0
6.70
SC10
2024-01-28
2024-01-29
0.043
6.7
14.70
SC10
2024-01-29
2024-01-30
0.000
14.7
9.00
SC10
2024-01-30
2024-01-31
0.000
9.0
7.95
SC9
2024-01-28
2024-01-29
0.040
18.3
1.00
Code
# Display the diagnostic plot created earlierknitr::include_graphics(here("analysis", "dynamic_window_analysis", "data_quality_diagnostics.png"))
Note: 5 observations (all from late January 2024) have limited wind data coverage. These appear to be gaps in the wind database rather than issues with the data preparation logic. All observations have butterflies present.
6.1 Filtering Low Quality Observations
Code
# Filter observations with metrics_complete < 0.95low_quality <- daily_data %>%filter(metrics_complete <0.95)cat("Observations to exclude (metrics_complete < 0.95):", nrow(low_quality), "\n")
Observations to exclude (metrics_complete < 0.95): 7
Code
cat("Percentage of dataset:", round(nrow(low_quality) /nrow(daily_data) *100, 1), "%\n\n")
Percentage of dataset: 6.8 %
Code
if (nrow(low_quality) >0) {cat("Excluded observations have:\n")cat("- Mean butterflies_95th_t_1:", round(mean(low_quality$butterflies_95th_percentile_t_1), 1), "\n")cat("- Mean butterflies_95th_t:", round(mean(low_quality$butterflies_95th_percentile_t), 1), "\n")cat("- Mean |butterfly_diff_95th|:", round(mean(abs(low_quality$butterfly_diff_95th)), 1), "\n")cat("- Mean metrics_complete:", round(mean(low_quality$metrics_complete), 3), "\n\n")}
Excluded observations have:
- Mean butterflies_95th_t_1: 22.9
- Mean butterflies_95th_t: 19.2
- Mean |butterfly_diff_95th|: 7.9
- Mean metrics_complete: 0.45
cat("- Mean butterflies_95th_t_1:", round(mean(daily_data$butterflies_95th_percentile_t_1), 1), "\n")
- Mean butterflies_95th_t_1: 123.1
Code
cat("- Mean butterflies_95th_t:", round(mean(daily_data$butterflies_95th_percentile_t), 1), "\n")
- Mean butterflies_95th_t: 113.8
Code
cat("- Mean |butterfly_diff_95th|:", round(mean(abs(daily_data$butterfly_diff_95th)), 1), "\n")
- Mean |butterfly_diff_95th|: 58
Code
cat("- Mean metrics_complete:", round(mean(daily_data$metrics_complete), 3), "\n")
- Mean metrics_complete: 0.996
Rationale for exclusion: The 7 excluded observations (6.8% of dataset) have relatively small butterfly counts (mean 95th percentile: 22.9 vs 123.1 for kept data) and incomplete weather data that could bias model estimates.
7 Correlation Matrix: All Candidate Predictors
This correlation matrix shows all potential fixed effects to help identify multicollinearity and guide variable selection.
Code
# Select all candidate predictors that exist in the datasetavailable_predictors <- all_predictors[all_predictors %in%names(daily_data)]# Create correlation matrixpredictor_data <- daily_data %>%select(all_of(available_predictors)) %>%na.omit()cor_matrix <-cor(predictor_data)# Create correlation plotcorrplot(cor_matrix,method ="color",type ="upper",order ="original",tl.cex =0.8,tl.col ="black",tl.srt =45,addCoef.col ="black",number.cex =0.6,title ="Correlation Matrix: All Candidate Predictors (Sunset Window)",mar =c(0, 0, 2, 0))
Code
# Print correlation matrix as tablekable(round(cor_matrix, 3),caption ="Correlation coefficients for all candidate predictors")
Correlation coefficients for all candidate predictors
butterflies_95th_percentile_t_1
max_butterflies_t_1
butterflies_top3_mean_t_1
temp_max
temp_min
temp_mean
temp_at_max_count_t_1
hours_above_15C
degree_hours_above_15C
wind_avg_sustained
wind_max_gust
wind_gust_sum
wind_gust_sum_above_2ms
wind_gust_hours
wind_minutes_above_2ms
wind_gust_sd
wind_mode_gust
sum_butterflies_direct_sun
lag_duration_hours
butterflies_95th_percentile_t_1
1.000
0.968
0.995
-0.027
-0.294
-0.219
-0.124
-0.163
-0.014
-0.155
-0.306
-0.194
-0.220
-0.194
-0.233
-0.240
-0.110
0.381
-0.082
max_butterflies_t_1
0.968
1.000
0.982
-0.016
-0.293
-0.215
-0.076
-0.160
-0.014
-0.151
-0.307
-0.198
-0.207
-0.198
-0.216
-0.204
-0.129
0.480
-0.126
butterflies_top3_mean_t_1
0.995
0.982
1.000
-0.038
-0.302
-0.237
-0.118
-0.177
-0.028
-0.154
-0.310
-0.196
-0.219
-0.196
-0.226
-0.233
-0.117
0.406
-0.093
temp_max
-0.027
-0.016
-0.038
1.000
0.056
0.711
0.139
0.606
0.919
-0.476
-0.218
-0.388
-0.317
-0.388
-0.364
-0.186
-0.427
-0.012
0.157
temp_min
-0.294
-0.293
-0.302
0.056
1.000
0.639
0.366
0.365
0.070
0.107
0.232
0.168
0.156
0.168
0.167
0.118
0.240
-0.330
0.086
temp_mean
-0.219
-0.215
-0.237
0.711
0.639
1.000
0.230
0.767
0.754
-0.178
0.064
-0.067
-0.030
-0.067
-0.059
0.016
-0.084
-0.177
0.297
temp_at_max_count_t_1
-0.124
-0.076
-0.118
0.139
0.366
0.230
1.000
0.138
0.039
-0.140
-0.105
-0.229
-0.177
-0.229
-0.177
-0.058
-0.101
-0.085
-0.574
hours_above_15C
-0.163
-0.160
-0.177
0.606
0.365
0.767
0.138
1.000
0.694
-0.121
0.058
-0.006
0.003
-0.006
0.002
0.029
-0.096
-0.087
0.374
degree_hours_above_15C
-0.014
-0.014
-0.028
0.919
0.070
0.754
0.039
0.694
1.000
-0.396
-0.196
-0.309
-0.269
-0.309
-0.294
-0.169
-0.340
-0.037
0.258
wind_avg_sustained
-0.155
-0.151
-0.154
-0.476
0.107
-0.178
-0.140
-0.121
-0.396
1.000
0.740
0.956
0.917
0.956
0.938
0.586
0.824
0.096
0.142
wind_max_gust
-0.306
-0.307
-0.310
-0.218
0.232
0.064
-0.105
0.058
-0.196
0.740
1.000
0.824
0.830
0.824
0.767
0.830
0.539
-0.159
0.314
wind_gust_sum
-0.194
-0.198
-0.196
-0.388
0.168
-0.067
-0.229
-0.006
-0.309
0.956
0.824
1.000
0.953
1.000
0.963
0.648
0.800
0.006
0.372
wind_gust_sum_above_2ms
-0.220
-0.207
-0.219
-0.317
0.156
-0.030
-0.177
0.003
-0.269
0.917
0.830
0.953
1.000
0.953
0.962
0.738
0.683
0.033
0.299
wind_gust_hours
-0.194
-0.198
-0.196
-0.388
0.168
-0.067
-0.229
-0.006
-0.309
0.956
0.824
1.000
0.953
1.000
0.963
0.648
0.800
0.006
0.372
wind_minutes_above_2ms
-0.233
-0.216
-0.226
-0.364
0.167
-0.059
-0.177
0.002
-0.294
0.938
0.767
0.963
0.962
0.963
1.000
0.630
0.743
0.053
0.300
wind_gust_sd
-0.240
-0.204
-0.233
-0.186
0.118
0.016
-0.058
0.029
-0.169
0.586
0.830
0.648
0.738
0.648
0.630
1.000
0.226
-0.022
0.211
wind_mode_gust
-0.110
-0.129
-0.117
-0.427
0.240
-0.084
-0.101
-0.096
-0.340
0.824
0.539
0.800
0.683
0.800
0.743
0.226
1.000
-0.039
0.153
sum_butterflies_direct_sun
0.381
0.480
0.406
-0.012
-0.330
-0.177
-0.085
-0.087
-0.037
0.096
-0.159
0.006
0.033
0.006
0.053
-0.022
-0.039
1.000
-0.116
lag_duration_hours
-0.082
-0.126
-0.093
0.157
0.086
0.297
-0.574
0.374
0.258
0.142
0.314
0.372
0.299
0.372
0.300
0.211
0.153
-0.116
1.000
7.1 Highly Correlated Pairs (|r| > 0.7)
Code
# Find pairs with high correlationhigh_cor_pairs <-data.frame()for (i in1:(nrow(cor_matrix)-1)) {for (j in (i+1):ncol(cor_matrix)) { cor_val <- cor_matrix[i, j]if (abs(cor_val) >0.7) { high_cor_pairs <-rbind( high_cor_pairs,data.frame(Var1 =rownames(cor_matrix)[i],Var2 =colnames(cor_matrix)[j],Correlation =round(cor_val, 3) ) ) } }}if (nrow(high_cor_pairs) >0) { high_cor_pairs <- high_cor_pairs %>%arrange(desc(abs(Correlation)))kable(high_cor_pairs,caption ="Predictor pairs with |r| > 0.7 (potential multicollinearity issues)")} else {cat("No predictor pairs with |r| > 0.7\n")}
Predictor pairs with |r| > 0.7 (potential multicollinearity issues)
Var1
Var2
Correlation
wind_gust_sum
wind_gust_hours
1.000
butterflies_95th_percentile_t_1
butterflies_top3_mean_t_1
0.995
max_butterflies_t_1
butterflies_top3_mean_t_1
0.982
butterflies_95th_percentile_t_1
max_butterflies_t_1
0.968
wind_gust_sum
wind_minutes_above_2ms
0.963
wind_gust_hours
wind_minutes_above_2ms
0.963
wind_gust_sum_above_2ms
wind_minutes_above_2ms
0.962
wind_avg_sustained
wind_gust_sum
0.956
wind_avg_sustained
wind_gust_hours
0.956
wind_gust_sum
wind_gust_sum_above_2ms
0.953
wind_gust_sum_above_2ms
wind_gust_hours
0.953
wind_avg_sustained
wind_minutes_above_2ms
0.938
temp_max
degree_hours_above_15C
0.919
wind_avg_sustained
wind_gust_sum_above_2ms
0.917
wind_max_gust
wind_gust_sum_above_2ms
0.830
wind_max_gust
wind_gust_sd
0.830
wind_avg_sustained
wind_mode_gust
0.824
wind_max_gust
wind_gust_sum
0.824
wind_max_gust
wind_gust_hours
0.824
wind_gust_sum
wind_mode_gust
0.800
wind_gust_hours
wind_mode_gust
0.800
temp_mean
hours_above_15C
0.767
wind_max_gust
wind_minutes_above_2ms
0.767
temp_mean
degree_hours_above_15C
0.754
wind_minutes_above_2ms
wind_mode_gust
0.743
wind_avg_sustained
wind_max_gust
0.740
wind_gust_sum_above_2ms
wind_gust_sd
0.738
temp_max
temp_mean
0.711
8 Model Building Strategy
Based on correlation analysis and biological relevance, we implement a comprehensive model comparison approach:
8.1 Selected Fixed Effects
Always included (in all models): - max_butterflies_t_1: Baseline count (tested both as smooth and linear) - lag_duration_hours: Window duration control (tested both as smooth and linear)
'gamm' based fit - care required with interpretation.
Checks based on working residuals may be misleading.
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(max_butterflies_t_1) 4.00 1.00 1.03 0.56
s(lag_duration_hours) 4.00 1.00 1.17 0.93
s(temp_min) 9.00 1.00 0.92 0.15
s(temp_max) 9.00 2.51 1.14 0.93
s(temp_at_max_count_t_1) 9.00 1.00 1.01 0.55
s(wind_max_gust) 9.00 1.65 1.02 0.56
s(sum_butterflies_direct_sun) 9.00 1.78 1.09 0.79
ti(temp_min,wind_max_gust) 16.00 2.14 1.11 0.87
ti(temp_max,wind_max_gust) 16.00 1.00 1.02 0.58
ti(temp_min,sum_butterflies_direct_sun) 16.00 3.16 1.11 0.84
s(deployment_id) 6.00 2.05 NA NA